Bloom Filters, Adaptivity, and the Dictionary Problem

Authors

  • Michael A. Bender
  • Martin Farach-Colton
  • Mayank Goswami
  • Rob Johnson
  • Samuel McCauley
  • Shikha Singh
Abstract

The Bloom filter, or more generally an approximate membership query data structure (AMQ), maintains a compact, probabilistic representation of a set S of keys from a universe U. An AMQ supports lookup, and possibly insert and delete operations. If x ∈ S, then lookup(x) returns “present.” If x ∉ S, then lookup(x) may return “present” with probability at most ε, where ε is a tunable false-positive probability, and such an x is called a false positive of the AMQ. Otherwise lookup(x) returns “absent.” AMQs have become widely used to accelerate dictionaries that are stored remotely (e.g., on disk or across a network). By using an AMQ, the dictionary needs to access the remote representation of S only when the AMQ indicates that the queried item might be present in S. Thus, the primary goal of an AMQ is to minimize its false-positive rate, so that the number of unnecessary accesses to the remote representation of S can be minimized. However, the false-positive guarantees for AMQs are rather weak. The false-positive probability of ε holds only for distinct or randomly chosen queries, but does not hold for arbitrary sequences of queries. For example, an adversary that chooses its queries based on the outcomes of previous queries can easily create a sequence of queries consisting almost entirely of false positives. Even simply repeating a randomly chosen query has an ε chance of producing a sequence consisting entirely of false positives. In this paper, we give adaptive AMQs that do have strong false-positive guarantees. In particular, for any fixed ε, our AMQs guarantee a false-positive rate of ε for every query and for every sequence of previously made queries. Furthermore, our adaptive AMQ is optimal in terms of space (up to lower-order terms) and complexity (all operations are constant time).
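
The weakness described in the abstract can be illustrated with a minimal sketch. It assumes a textbook, non-adaptive Bloom filter built on SHA-256. It is not the adaptive AMQ of the paper, and the names `BloomFilter`, `find_false_positive`, and the `key-`/`probe-`/`fresh-` key formats are hypothetical choices made for this illustration. The sketch checks that members are always reported present, that distinct non-member queries are false positives at a rate near ε, and that once one false positive is found, repeating it produces a false positive every time.

```python
import hashlib
import math


class BloomFilter:
    """Textbook (non-adaptive) Bloom filter: an m-bit array probed by k hashes."""

    def __init__(self, capacity: int, eps: float) -> None:
        # Standard sizing: m = -n ln(eps) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.m = max(1, math.ceil(-capacity * math.log(eps) / math.log(2) ** 2))
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray(self.m)  # one byte per bit, for readability

    def _positions(self, key: str):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def insert(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p] = 1

    def lookup(self, key: str) -> bool:
        # True means "present" (possibly a false positive); False means "absent".
        return all(self.bits[p] for p in self._positions(key))


def find_false_positive(amq, members, max_tries=1_000_000):
    """Scan non-member keys until one is reported present; None if none is found."""
    for i in range(max_tries):
        candidate = f"probe-{i}"
        if candidate not in members and amq.lookup(candidate):
            return candidate
    return None


members = {f"key-{i}" for i in range(1000)}
amq = BloomFilter(capacity=len(members), eps=0.05)
for key in members:
    amq.insert(key)

# Distinct non-member queries: the fraction reported present should be near eps.
fresh = [f"fresh-{i}" for i in range(20_000)]
measured = sum(amq.lookup(q) for q in fresh) / len(fresh)

# Repeated query: the filter's state never changes, so the answer never changes.
fp = find_false_positive(amq, members)
repeats = [amq.lookup(fp) for _ in range(100)] if fp is not None else []

print(f"false-positive rate on distinct queries: {measured:.3f}")
print(f"repeated false positive {fp!r}: {sum(repeats)} of {len(repeats)} lookups say present")
```

Because this filter answers each query deterministically from a fixed state, the ε bound applies only when queries are distinct or chosen independently of earlier answers, which is the gap the abstract's adaptive AMQs are meant to close.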
This research was supported in part by NSF grants CCF 1114809, CCF 1217708, CCF 1218188, CCF 1314633, CCF 1637458, IIS 1247726, IIS 1251137, CNS 1408695, CNS 1408782, CCF 1439084, CCF-BSF 1716252, CCF 1617618, IIS 1541613, and CAREER Award CCF 1553385, as well as NIH grant 1U01CA198952-01, by the European Research Council under the European Union’s 7th Framework Programme (FP7/2007-2013) / ERC grant agreement no. 614331, and by Sandia National Laboratories, EMC, Inc., and NetApp, Inc.

  ∗ Stony Brook University, Stony Brook, NY 11794-4400, USA. Email: {bender,shiksingh}@cs.stonybrook.edu.
  † Rutgers University, Piscataway, NJ 08855, USA.
  ‡ Queens College, CUNY, New York, USA.
  § VMware Research, Creekside F, 3425 Hillview Ave, Palo Alto, CA 94304.
  ¶ IT University of Copenhagen, Copenhagen, Denmark.

Similar resources

Approximate Membership of Sets: A Survey

The task of representing a set so as to support membership queries is a common one in computer science. If space is no object, a complete dictionary may provide accurate querying in good runtime. If space is at a premium and the size of the set uncertain, Bloom filters and hash compaction provide good approximations. If the size is completely unknown, a modified Bloom filter provides very good ...

Bloom filters and their applications

Bloom filters, a new approach to hashing, were first presented by Burton Bloom [Blo70]. He considered the task of representing a set as a sequence of bits, called the hash code of the set. Compared with the conventional hash-coding method, the author offered another technique that yields a more concise representation of the set, but with some errors. There are applications in which...

A Cuckoo Filter Modification Inspired by Bloom Filter

Probabilistic data structures are popular in membership queries, network applications, and so on. The Bloom filter and the Cuckoo filter are two popular space-efficient models used in the set-membership checking part of many important protocols. They are compact representations of data that use hash functions to randomize a set of items. Being able to store more elements while keeping a reaso...

Bloom Cookies: Web Search Personalization without User Tracking

We propose Bloom cookies that encode a user’s profile in a compact and privacy-preserving way, without preventing online services from using it for personalization purposes. The Bloom cookies design is inspired by our analysis of a large set of web search logs that shows drawbacks of two profile obfuscation techniques, namely profile generalization and noise injection, today used by many privac...

An Optimal Bloom Filter Replacement Based on Matrix Solving

We suggest a method for holding a dictionary data structure, which maps keys to values, in the spirit of Bloom filters. The space requirements of the dictionary we suggest are much smaller than those of a hash table. We allow storing n keys, each mapped to a value that is a string of k bits. Our suggested method requires nk + o(n) bits of space to store the dictionary, and O(n) time to produce the d...

Bloofi: Multidimensional Bloom Filters

Bloom filters are probabilistic data structures commonly used for approximate membership problems in many areas of computer science (networking, distributed systems, databases, etc.). With the increase in data size and distribution of data, problems arise where a large number of Bloom filters are available, and all of them need to be searched for potential matches. As an example, in a federated cl...

Journal title:
  • CoRR

Volume abs/1711.01616

Publication date 2017